An Effortless Way To Create Large-Scale Datasets For Famous Speakers
Authors
Abstract
The creation of large-scale multimedia datasets has become a scientific matter in itself. Indeed, fully manual annotation of hundreds or thousands of hours of video and/or audio is practically infeasible. In this paper, we propose an extremely handy approach to automatically construct a database of famous speakers from TV broadcast news material. We then run a user experiment with a correctly designed tool, which demonstrates that very reliable results can be obtained with this method. In particular, a thorough error analysis demonstrates the value of the approach and provides hints for improving the quality of the dataset.
Similar resources
Evaluation of Updating Methods in Building Blocks Dataset
With the increasing use of spatial data in daily life, the production of this data from diverse information sources with different precision and scales has grown widely. Generating new data requires a great deal of time and money. Therefore, one way to reduce costs is to update the old data at different scales using new data (produced on a similar scale). One approach to updating data i...
Researching (Non) Fluent L2 Speakers’ Oral Communication Deficiencies: A Psycholinguistic Perspective
Fluency in a second language (L2) involves a quintessentially cognitive processing system that operates quickly and effectively. The perceived importance of researching fluency through a psycholinguistic lens has motivated the related L2 research to resort to current cognitive speaking-specific models. This study, drawing on Levelt’s (1999a) psycholinguistic model, probed the deficiency sources...
Audience Design in the Generation of References to Famous People
This paper seeks to fill a gap in existing computational models of the production of referring expressions, by addressing situations in which speakers have difficulty assessing what information is available to their audience. The paper describes a two-part experiment where speakers were given the name of a famous person and had to create a description that would enable a hearer to identify the ...
An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering
Multivariate time series (MTS) data are ubiquitous in science and daily life, and measuring their similarity is a core part of the MTS analysis process. Many research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative...
Link Prediction using Network Embedding based on Global Similarity
Background: Link prediction is one of the most widely studied problems in complex network analysis. Link prediction requires knowing the background of previous link connections and combining them with available information. Local link prediction approaches based on node structure are fast but not accurate enough. On the other hand, the global link predicti...